Automatic Text Correction for Devanagari OCR
Similar resources
Statistical Learning for OCR Text Correction
The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications used in text analyzing pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are still prone to suggest correction candidates from limited observations while insufficiently accounting for the characteristics of OCR errors. In this paper, ...
An Efficient OCR Error Correction Method for Japanese Text Recognition
OCR error correction using Japanese morphological analysis contains two time-consuming procedures: extraction of candidate words from combinations of candidate characters, and finding the most plausible word sequence in combinations of the candidate words. In this paper an optimal word extraction technique, and the use of lexical entries that are tailored for Japanese verb inflection, are inves...
Automatic Reformatting of OCR Text from Biomedical Journal Articles
The goal of the Medical Article Record System (MARS), being developed by the National Library of Medicine, is to reduce the manual keyboard entry of bibliographic citation fields for the MEDLINE database by automatically identifying and converting information from bitmapped images of biomedical journal article pages to ASCII data. An important element of this automatic conversion requires refor...
A Statistical Approach to Automatic OCR Error Correction in Context
This paper describes an automatic, context-sensitive, word-error correction system based on statistical language modeling (SLM) as applied to optical character recognition (OCR) postprocessing. The system exploits information from multiple sources, including letter n-grams, character confusion probabilities, and word-bigram probabilities. Letter n-grams are used to index the words in the lexico...
Evaluating OCR and Non-OCR Text
In literature, many feature types and learning algorithms are proposed for document classification. However, an extensive and systematic evaluation of the various approaches has not been done yet. In order to investigate different text representations for document classification, we have developed a tool which transforms documents into feature-value representations suitable for standard learning ...
Journal
Journal title: Indian Journal of Science and Technology
Year: 2016
ISSN: 0974-5645,0974-6846
DOI: 10.17485/ijst/2016/v9i45/106372